Abstract

Calculations of the sensitivity of future experiments are a necessary ingredient of experimental high energy physics. Especially in the context of measurements of the neutrino oscillation parameters, extensive studies are performed to arrive at the optimal configuration. In this note we clarify the definition of sensitivity as it is often applied in these studies. In addition, we examine two of the most common methods to calculate sensitivity from a statistical perspective using a toy model. We point out the importance of including uncertainties in nuisance parameters for the interpretation of sensitivity calculations.

Key words: sensitivity, statistical methods, neutrino oscillation experiments 
PACS: 29.90.+r,14.60.Pq 



1 Introduction 

In the process of developing experiments that measure new phenomena in physics, estimating the sensitivity of candidate experimental configurations is of utmost importance.

A particularly active field is the estimation of sensitivities for future neutrino oscillation experiments, see for example [1]. The goal of future experiments is often the measurement of a mixing angle (denoted $\theta_{13}$), the probability for a neutrino oscillation to take place being proportional to $\sin^2 2\theta_{13}$.

In sensitivity studies for neutrino oscillation experiments, "sensitivity" is often not defined in the same way. Furthermore, uncertainties in nuisance parameters (often sloppily called "systematic uncertainties"¹) are often ignored (see for example [3,4]) or included in the calculations in an incomplete manner (see the discussion in [5]). Little attention seems to be given to how sensitivity is defined and how uncertainties are treated, despite the fact that decisions on experimental set-ups might be based on small differences in sensitivity studies.

Preprint submitted to Elsevier

2 February 2008

In this note we first clarify the definition of "sensitivity" and then discuss, using a toy model, potential problems that arise when instrumental uncertainties need to be considered.

After being discussed at a recent conference on future neutrino experiments [5], the issue of sensitivity calculation has already inspired a more careful assessment [6], which indicates that a more formal discussion is worthwhile.



2 Definitions of sensitivity 

Probably the most common definition of sensitivity adopted in the study of future experiments aimed at the discovery of signals of as yet undetected physics phenomena is:

The experiment is said to be sensitive to a given value of the parameter $\theta_{13} = \theta_{13}^{\mathrm{sens}}$ at significance level $\alpha$ if the mean p-value obtained given $\theta_{13}^{\mathrm{sens}}$ is smaller than $\alpha$.

Here, in the spirit of neutrino oscillation experiments, we denote the parameter describing the new physics phenomenon by $\theta_{13}$. The p-value is by definition calculated under the condition that the null hypothesis holds:

$$p = P(T \geq t_{\mathrm{obs}} \mid H_0 : \theta_{13} = 0) \qquad (1)$$

where $H_0$ denotes the null hypothesis, $T$ the test statistic and $t_{\mathrm{obs}}$ its observed value; $P$ is computed from the distribution of $T$ under $H_0$. We will give common definitions of $T$ in the next section.
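As a minimal sketch of equation (1) and of the "mean p-value" criterion, assume a single counting experiment with a known background $b$ and take the event count $N$ itself as the test statistic $T$. Both choices are our simplifying assumptions, not part of the definition, and the helper names are hypothetical.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(N = k) for a Poisson mean mu > 0, evaluated in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def p_value(n_obs: int, b: float) -> float:
    """Eq. (1) with T = N: p = P(N >= n_obs | H0: s = 0, background b)."""
    below = sum(poisson_pmf(k, b) for k in range(n_obs))  # P(N <= n_obs - 1)
    return max(0.0, 1.0 - below)  # clamp tiny negative rounding errors


def mean_p_value(s: float, b: float, k_max: int = 200) -> float:
    """Mean p-value over replicas generated with signal s (first definition)."""
    return sum(poisson_pmf(k, s + b) * p_value(k, b) for k in range(k_max))


def is_sensitive(s: float, b: float, alpha: float) -> bool:
    """First definition of sensitivity: the mean p-value is smaller than alpha."""
    return mean_p_value(s, b) < alpha
```

Scanning `s` upwards until `is_sensitive(s, b, alpha)` first returns `True` gives the sensitivity in this simplified setting.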



1 We refer the reader to [2] for a discussion of how to classify uncertainties in nuisance parameters.
A variation, which is the most commonly used in the context of neutrino experiments, uses confidence intervals to define sensitivity:



The experiment is said to be sensitive to a given value of the parameter $\theta_{13} = \theta_{13}^{\mathrm{sens}}$ at significance level $\alpha$ if the mean $1 - \alpha$ confidence interval obtained, given $\theta_{13}^{\mathrm{sens}}$, does not contain $\theta_{13} = 0$.



The two definitions can be equivalent but are not in general; this depends on the choice of test statistic and on the method used to calculate confidence intervals. For example, choosing to calculate upper limits would never yield a detection, since an upper limit never excludes $\theta_{13} = 0$.



To our knowledge, all sensitivity curves presented for neutrino oscillation experiments follow the above definitions and therefore correspond to the mean observation. This has two important implications. Firstly, if the distribution of parameter estimates is Gaussian, it is a well-known (but often ignored) fact that even if $\theta_{13}^{\mathrm{true}} = \theta_{13}^{\mathrm{sens}}$ (i.e. the true value is at the estimated sensitivity), the probability of actually claiming discovery is only 50 %, since in half of the experiments the estimate fluctuates below the mean. Secondly, if the parameter estimates are not Gaussian (see for example [5]), the mean result gives very little information on the actual probability of discovery. Presenting only mean experimental results is therefore not particularly informative, and a more general definition of sensitivity has to specify two probabilities:



The experiment is sensitive to a given value $\theta_{13} = \theta_{13}^{\mathrm{sens}}$ if the probability of obtaining an observation $n$ that rejects $\theta_{13} = 0$ at significance level $\alpha$ is at least $\beta$.



Here we have formulated the definition so that it applies to both definitions given above. Another example of an attempt to define sensitivity in terms of the detection probability can be found in [7].
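The 50 % statement and the two-probability definition can be made concrete in the Gaussian case. The sketch below assumes a one-sided test on a Gaussian estimate $\hat\theta \sim G(\theta_{\mathrm{true}}, \sigma)$ with known width $\sigma$; this model and the function names are our illustrative assumptions. With $\beta = 0.5$ it reproduces the mean-based sensitivity $z_{1-\alpha}\sigma$, and the detection probability at that value is exactly one half.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, width 1


def detection_probability(theta_true: float, sigma: float, alpha: float) -> float:
    """Probability that a Gaussian estimate rejects theta = 0 at one-sided level alpha."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    # Reject H0 when theta_hat / sigma > z_alpha, with theta_hat ~ N(theta_true, sigma).
    return 1.0 - STD_NORMAL.cdf(z_alpha - theta_true / sigma)


def sensitivity(sigma: float, alpha: float, beta: float = 0.5) -> float:
    """Smallest true value whose detection probability reaches beta."""
    return sigma * (STD_NORMAL.inv_cdf(1.0 - alpha) + STD_NORMAL.inv_cdf(beta))


# Mean-based sensitivity (beta = 0.5) versus a 90 % detection requirement.
for beta in (0.5, 0.9):
    theta_sens = sensitivity(sigma=1.0, alpha=0.01, beta=beta)
    print(beta, round(theta_sens, 3), round(detection_probability(theta_sens, 1.0, 0.01), 3))
```

Requiring $\beta = 0.9$ instead of 0.5 moves the quoted sensitivity from $z_{1-\alpha}\sigma$ to $(z_{1-\alpha} + z_{0.9})\sigma$, which shows how strongly the number depends on the detection probability one has in mind.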



3 Claiming discovery and calculating confidence intervals 



According to the Neyman-Pearson lemma, the most powerful test statistic for testing a simple null hypothesis against a simple alternative is the likelihood ratio:

$$T = \frac{\mathcal{L}(n \mid H_0)}{\mathcal{L}(n \mid H_1)} \qquad (2)$$
where $T$ denotes the test statistic and $\mathcal{L}(n \mid H)$ denotes the likelihood of the observation $n$ under the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$), respectively. One useful property of the likelihood ratio is that, asymptotically,

$$-2 \ln T \sim \chi^2 \qquad (3)$$



i.e. the distribution of $-2 \ln T$ under the null hypothesis is known, and the significance of an observation can be calculated from the $\chi^2$ distribution. A particularly common method in studies of neutrino oscillation experiments is to perform a $\chi^2$ fit and calculate a confidence interval from the function $\chi^2(\theta_{13})$. For example, the interval $[\theta_{13}^{\min}, \theta_{13}^{\max}]$ is given by the points for which

$$\chi^2(\theta_{13}) - \chi^2_{\min} = 2.706 \qquad (4)$$

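where $\chi^2_{\min}$ denotes the $\chi^2$ at the best-fit value of $\theta_{13}$. The value 2.706 is the 90 % quantile of the $\chi^2$ distribution with one degree of freedom, so the interval has a nominal confidence level of 90 %. This confidence interval is then often used to claim discovery by requiring $\theta_{13}^{\min} > 0$.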
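The interval construction of equation (4) can be sketched with a simple root finder. The $\chi^2$ function below is a hypothetical Gaussian measurement $x = 3.0 \pm 1.0$ chosen only for illustration; for it the interval should be $x \pm 1.645$, and the helper names are our own.

```python
from statistics import NormalDist

# 90 % quantile of chi^2 with 1 d.o.f. = square of the 95 % standard-normal quantile (about 2.706).
DELTA_CHI2_90 = NormalDist().inv_cdf(0.95) ** 2


def bisect_root(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Root of f in [lo, hi]; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def delta_chi2_interval(chi2, theta_best: float, lo: float, hi: float, delta: float = DELTA_CHI2_90):
    """Eq. (4): the points where chi2(theta) - chi2_min = delta, searched inside [lo, hi]."""
    chi2_min = chi2(theta_best)
    excess = lambda theta: chi2(theta) - chi2_min - delta
    return bisect_root(excess, lo, theta_best), bisect_root(excess, theta_best, hi)


# Hypothetical Gaussian measurement x = 3.0 +- 1.0 of a parameter theta.
x, sigma = 3.0, 1.0
theta_min, theta_max = delta_chi2_interval(lambda t: ((x - t) / sigma) ** 2, x, -10.0, 10.0)
claims_discovery = theta_min > 0.0  # the criterion theta_min > 0 discussed in the text
```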

Two quantities are of crucial importance in the calculation of confidence intervals and in the testing of hypotheses (claiming discovery). The first is coverage: methods to calculate confidence intervals should have correct coverage, defined as:



An algorithm is said to have correct coverage if, given a confidence level $1 - \alpha$ and a large number of repeated identical experiments, the resulting confidence intervals include the true value of the parameters to be estimated in a fraction $1 - \alpha$ of all experiments.

If confidence intervals are used to claim discovery (i.e. for testing hypotheses), then $\alpha$ is the probability of making a type I error, i.e. of rejecting the null hypothesis although it is true (often called the significance).

The other quantity we are interested in is the power. Power is the probability that the null hypothesis is rejected given that the alternative hypothesis is true. This is exactly the probability we denoted $\beta$ in the previous section. The probability $1 - \beta$ is the probability of making a type II error (accepting the null hypothesis although it is false).
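Coverage, significance and power are frequency properties, so they can be estimated by brute-force simulation of experiment replicas. The following is a minimal sketch with hypothetical helper names; the Gaussian measurement with unit width is only an illustration. Note that with a two-sided 90 % interval, the one-sided claim "lower edge above zero" has a type I error rate of about 5 %, not 10 %.

```python
import random


def rejection_rate(test, sample, theta_true: float, n_rep: int = 20000, seed: int = 1) -> float:
    """Fraction of replicas in which `test` rejects H0 when data are drawn at theta_true.

    For theta_true under H0 this estimates the type I error rate (significance);
    for theta_true under H1 it estimates the power."""
    rng = random.Random(seed)
    return sum(bool(test(sample(theta_true, rng))) for _ in range(n_rep)) / n_rep


def coverage(interval, sample, theta_true: float, n_rep: int = 20000, seed: int = 1) -> float:
    """Fraction of replicas whose confidence interval contains theta_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        lo, hi = interval(sample(theta_true, rng))
        hits += lo <= theta_true <= hi
    return hits / n_rep


# Hypothetical Gaussian measurement x ~ N(theta, 1), nominal 90 % interval x +- 1.645,
# and the discovery claim "lower edge of the interval > 0".
gauss_sample = lambda theta, rng: rng.gauss(theta, 1.0)
interval_90 = lambda x: (x - 1.645, x + 1.645)
claim = lambda x: x - 1.645 > 0.0

cov = coverage(interval_90, gauss_sample, theta_true=0.5)
type_one = rejection_rate(claim, gauss_sample, theta_true=0.0)
power = rejection_rate(claim, gauss_sample, theta_true=1.645)
```

The same two functions apply unchanged to any interval or test procedure, including ones with nuisance parameters, as long as `sample` generates complete replicas of the measurement.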
4 Including uncertainties in nuisance parameters

Nuisance parameters are parameters which enter the data model but are not of prime interest. Probably the most common example is the expected background in a Poisson process. Sensitivities (like confidence intervals) are usually calculated only for the parameter of primary interest, and it is not desirable to express them as functions of parameters which are of no physical interest and specific to the experiment. Thus, ways have to be found to eliminate the nuisance parameters. There are two particularly common approaches.
In the first method, the probability density function (PDF) without uncertainty in the nuisance parameters is replaced by one that is integrated over all possible true values of the nuisance parameter (integration method):

$$P(n \mid s, b_{\mathrm{true}}) \rightarrow \int P(n \mid s, b_{\mathrm{true}})\, P(b_{\mathrm{true}} \mid b_{\mathrm{est}})\, \mathrm{d}b_{\mathrm{true}} \qquad (5)$$

Here $b_{\mathrm{true}}$ is the true value of the nuisance parameter and $b_{\mathrm{est}}$ is its estimate. Since $P(b_{\mathrm{true}} \mid b_{\mathrm{est}})$ describes the probability of the true value given its estimate (and not vice versa), this method is Bayesian: some prior probability distribution for the true value of the nuisance parameter has to be assumed. In the other common method, the PDF is replaced by one where, for each $s$, the likelihood is maximized with respect to the nuisance parameters (profiling method):

$$P(n \mid s, b_{\mathrm{true}}) \rightarrow \max_{b_{\mathrm{true}}} \mathcal{L}(n \mid s, b_{\mathrm{true}}) \qquad (6)$$

with notation as above. This method is completely frequentist, since it never treats $b_{\mathrm{true}}$ as a random variable; the argument of the maximization is therefore a likelihood function and not a PDF. Both methods are frequently applied in high energy physics in confidence interval calculations; see [8], [9] and references therein.
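To make equations (5) and (6) concrete for the counting case of the toy model below, here is a sketch that evaluates both numerically. Our assumptions: the prior $P(b_{\mathrm{true}} \mid b_{\mathrm{est}})$ is a Gaussian $G(b_{\mathrm{est}}, \sigma_b)$ truncated at zero (the flat-prior posterior of a Gaussian background measurement), and in the profile the background measurement enters the likelihood as a Gaussian term $G(b_{\mathrm{est}} \mid b_{\mathrm{true}}, \sigma_b)$. The midpoint rule and grid search are simple illustrative choices, and all names are hypothetical.

```python
import math


def poisson_pmf(n: int, mu: float) -> float:
    """P(n | mu) for a Poisson mean mu > 0."""
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def gauss_pdf(x: float, mean: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def b_grid(b_est: float, sigma_b: float, n_points: int):
    """Midpoints of n_points cells covering b_est +- 6 sigma_b, truncated at b > 0."""
    lo, hi = max(1e-9, b_est - 6.0 * sigma_b), b_est + 6.0 * sigma_b
    step = (hi - lo) / n_points
    return [lo + (i + 0.5) * step for i in range(n_points)]


def integrated_pdf(n: int, s: float, b_est: float, sigma_b: float, n_points: int = 2000) -> float:
    """Eq. (5): P(n | s, b_true) averaged over a Gaussian prior G(b_est, sigma_b) for b_true,
    truncated at zero and renormalised (midpoint rule)."""
    grid = b_grid(b_est, sigma_b, n_points)
    weights = [gauss_pdf(b, b_est, sigma_b) for b in grid]
    return sum(w * poisson_pmf(n, s + b) for w, b in zip(weights, grid)) / sum(weights)


def profile_likelihood(n: int, s: float, b_est: float, sigma_b: float, n_points: int = 4000) -> float:
    """Eq. (6): max over b_true of Po(n | s + b_true) * G(b_est | b_true, sigma_b), by grid search."""
    return max(poisson_pmf(n, s + b) * gauss_pdf(b_est, b, sigma_b)
               for b in b_grid(b_est, sigma_b, n_points))
```

The integrated result is again a normalized distribution in $n$, whereas the profile is only a likelihood, which mirrors the Bayesian/frequentist distinction made above.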

In assessments of the sensitivity of neutrino oscillation experiments, uncertainties are often included by performing a least-squares fit with a modified $\chi^2$ and using the resulting confidence interval (see previous section) to claim discovery. Two modifications are particularly common. One method is to add the uncertainty in the background estimate in quadrature (for simplicity we restrict ourselves to uncertainties in the background estimate):

$$\chi^2_{\mathrm{add}} = \frac{(n_{\mathrm{obs}} - b_{\mathrm{est}})^2}{b_{\mathrm{est}} + \sigma_b^2} \qquad (7)$$
where $n_{\mathrm{obs}}$ denotes the experimental result, $b_{\mathrm{est}}$ the background estimate and $\sigma_b$ the uncertainty on that estimate. Under the assumption of a Gaussian process and applying Bayesian reasoning, $\chi^2_{\mathrm{add}}$ can be viewed as equivalent to the method of equation (5).
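A short calculation shows where this equivalence comes from. Assume, beyond what is stated above, that $n$ is Gaussian with mean $s + b_{\mathrm{true}}$ and a fixed variance $\sigma_n^2$, and that a flat prior turns the Gaussian background measurement into $P(b_{\mathrm{true}} \mid b_{\mathrm{est}}) = G(b_{\mathrm{est}}, \sigma_b)$. Writing $G(x; \mu, \sigma)$ for a Gaussian density in $x$, the integral in equation (5) is a convolution of two Gaussians:

$$\int G(n;\, s + b_{\mathrm{true}}, \sigma_n)\, G(b_{\mathrm{true}};\, b_{\mathrm{est}}, \sigma_b)\, \mathrm{d}b_{\mathrm{true}} = G\!\left(n;\, s + b_{\mathrm{est}}, \sqrt{\sigma_n^2 + \sigma_b^2}\right),$$

so that, up to a constant,

$$-2 \ln P(n \mid s, b_{\mathrm{est}}) = \frac{(n - s - b_{\mathrm{est}})^2}{\sigma_n^2 + \sigma_b^2}.$$

For $s = 0$ and the Poisson-motivated choice $\sigma_n^2 \approx b_{\mathrm{est}}$ this becomes $(n_{\mathrm{obs}} - b_{\mathrm{est}})^2/(b_{\mathrm{est}} + \sigma_b^2)$: the background uncertainty simply adds in quadrature to the statistical one.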



The other (probably more common) method of inclusion is based on adding a normalization parameter to the $\chi^2$ and minimizing the $\chi^2$ with respect to it, see for example [10,11,12]:

$$\chi^2_{\mathrm{prof}} = \min_{\lambda} \left[ \frac{(n_{\mathrm{obs}} - \lambda b_{\mathrm{est}})^2}{\lambda b_{\mathrm{est}}} + \frac{(\lambda - 1)^2}{(\sigma_b / b_{\mathrm{est}})^2} \right] \qquad (8)$$

where, in addition to the quantities described above, we introduce the normalization parameter $\lambda$. Under the assumption of Gaussian processes, this modification is equivalent to the method represented by equation (6).
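Under the same idealized Gaussian assumptions (a fixed statistical variance $\sigma_n^2$, and the background measurement entering the likelihood as a Gaussian term), the link to equation (6) can be checked directly. Up to a constant,

$$-2 \ln \mathcal{L}(n_{\mathrm{obs}}, b_{\mathrm{est}} \mid s = 0, b_{\mathrm{true}}) = \frac{(n_{\mathrm{obs}} - b_{\mathrm{true}})^2}{\sigma_n^2} + \frac{(b_{\mathrm{est}} - b_{\mathrm{true}})^2}{\sigma_b^2},$$

and substituting $b_{\mathrm{true}} = \lambda b_{\mathrm{est}}$ turns the second term into $(\lambda - 1)^2/(\sigma_b/b_{\mathrm{est}})^2$, so maximizing the likelihood over $b_{\mathrm{true}}$ is the same as minimizing over $\lambda$. The minimum lies at

$$\hat{b}_{\mathrm{true}} = \frac{\sigma_b^2\, n_{\mathrm{obs}} + \sigma_n^2\, b_{\mathrm{est}}}{\sigma_n^2 + \sigma_b^2}, \qquad \min_{b_{\mathrm{true}}} \left[ \frac{(n_{\mathrm{obs}} - b_{\mathrm{true}})^2}{\sigma_n^2} + \frac{(b_{\mathrm{est}} - b_{\mathrm{true}})^2}{\sigma_b^2} \right] = \frac{(n_{\mathrm{obs}} - b_{\mathrm{est}})^2}{\sigma_n^2 + \sigma_b^2}.$$

With a fixed $\sigma_n^2$, profiling therefore reproduces the quadrature result exactly; differences between the two methods can only come from departures from this idealization, for example a variance that depends on the fitted background.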



A priori, it cannot be assumed that the modified quantities $\chi^2_{\mathrm{prof}}$ and $\chi^2_{\mathrm{add}}$ still follow a $\chi^2$ distribution. In general, however, this is the assumption employed in sensitivity calculations²,³. In the following section we apply the above definitions to a toy model and check the validity of this assumption using Monte Carlo simulations.



5 Testing the $\chi^2$ method with a toy model



For simplicity, we consider a one-bin measurement in which we count a number of events from a Poisson process with a background contribution, and obtain an estimate of the background from a separate measurement, which is assumed to be Gaussian. In equations:

$$n \sim \mathrm{Po}(s + b); \qquad b_{\mathrm{est}} \sim G(b, \sigma_b) \qquad (9)$$



where $\mathrm{Po}(s + b)$ denotes a Poisson process with experimental outcome $n$ (number of events), signal parameter $s$ and background parameter $b$, and $G(b, \sigma_b)$ denotes a Gaussian process with experimental outcome $b_{\mathrm{est}}$ and width $\sigma_b$. Since in the common neutrino oscillation experiment $s \propto \sin^2 2\theta_{13}$, this toy model captures the main feature of many experiments (though it is, obviously, a simplification).

2 If uncertainties are ignored, the $\chi^2$ is obviously not modified; in our simple example, $\chi^2 = (n_{\mathrm{obs}} - b_{\mathrm{est}})^2 / b_{\mathrm{est}}$. However, its distribution under the null hypothesis is not $\chi^2$, since $b_{\mathrm{est}}$ is not fixed at the true value but is a random variable.

3 The only exception known to us is [6].



Using Monte Carlo simulations of replicas of the actual experiment, we can calculate the true distribution of the test statistics defined in equations (7) and (8) under the condition that the null hypothesis is true ($s = 0$), and thus the coverage. We can also assume $s > 0$ and calculate the probability that the null hypothesis is rejected given that the alternative hypothesis is true, i.e. the power.
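The replica procedure can be sketched as follows. This is an illustration under our own reading of equations (7) and (8) (Pearson-type variances $b_{\mathrm{est}}$ and $\lambda b_{\mathrm{est}}$, and $\sigma_b$ known to the experimenter); it is not the code used for the figures, and its numbers need not match them. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_draw(mu: float, rng: random.Random) -> int:
    """Poisson random number via Knuth's multiplication method (fine for small mu)."""
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def chi2_ignore(n: int, b_est: float) -> float:
    return (n - b_est) ** 2 / b_est


def chi2_add(n: int, b_est: float, sigma_b: float) -> float:
    return (n - b_est) ** 2 / (b_est + sigma_b ** 2)


def chi2_prof(n: int, b_est: float, sigma_b: float, iters: int = 60) -> float:
    """Minimise the convex function of lambda in eq. (8) by golden-section search."""
    f = lambda lam: ((n - lam * b_est) ** 2 / (lam * b_est)
                     + (lam - 1.0) ** 2 / (sigma_b / b_est) ** 2)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = 1e-6, 20.0
    x1, x2 = right - ratio * (right - left), left + ratio * (right - left)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 < f2:
            right, x2, f2 = x2, x1, f1
            x1 = right - ratio * (right - left)
            f1 = f(x1)
        else:
            left, x1, f1 = x1, x2, f2
            x2 = left + ratio * (right - left)
            f2 = f(x2)
    # lambda = 1 reproduces chi2_ignore, so the minimum can never exceed it.
    return min(f1, f2, f(1.0))


def exceedance_rates(s: float, b_true: float = 10.0, rel_sigma: float = 0.2,
                     cl: float = 0.99, n_rep: int = 10000, seed: int = 1) -> dict:
    """Fraction of replicas (n, b_est) from eq. (9) whose statistic exceeds the nominal
    1-d.o.f. chi^2 quantile: the false detection rate for s = 0, a power for s > 0."""
    threshold = NormalDist().inv_cdf(0.5 + cl / 2.0) ** 2  # about 6.635 for cl = 0.99
    sigma_b = rel_sigma * b_true  # assumed known to the experimenter
    rng = random.Random(seed)
    counts = {"ignore": 0, "add": 0, "prof": 0}
    for _ in range(n_rep):
        n = poisson_draw(s + b_true, rng)
        b_est = max(rng.gauss(b_true, sigma_b), 1e-3)  # guard against b_est <= 0
        counts["ignore"] += chi2_ignore(n, b_est) > threshold
        counts["add"] += chi2_add(n, b_est, sigma_b) > threshold
        counts["prof"] += chi2_prof(n, b_est, sigma_b) > threshold
    return {name: c / n_rep for name, c in counts.items()}
```

Comparing `exceedance_rates(0.0)["ignore"]` with the nominal 0.01 shows the inflated false detection rate when the background uncertainty is ignored.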



Figures 1, 2 and 3 exemplify the results. Figure 1 shows the value of the modified $\chi^2$ statistics as a function of the corresponding coverage. Results are shown for a true background of $b_{\mathrm{true}} = 10$ and an uncertainty in the background estimate of 20 %. For this very simple example, it can clearly be seen that ignoring uncertainties (in this case in the background estimate) leads to an increased rate of false detections compared with the rate the experimenter intends. For a nominal 99 % threshold, for example, the real false detection rate is larger by a factor of about 3. The effect becomes smaller if the additional uncertainties are included in one of the two ways described in equations (7) and (8). Using the latter, for example, the false detection rate is 50 % higher than nominally required.



Although we assume a background of $b_{\mathrm{true}} = 10$, part of the difference found could be due to the fact that we use $\chi^2$ statistics for a Poisson process. We therefore also include the case of a strictly Gaussian measurement process (see fig. 1, right panel). The difference between the methods becomes less pronounced, but is still large.



In figure 2 we show results for a smaller uncertainty in the background estimate (10 %). As intuitively expected, the method chosen to calculate the significance becomes less important. If we consider a truly Gaussian process (right panel), then for smaller uncertainties both the method using quadratic addition and the profiling method give results compatible with a nominal $\chi^2$ distribution.

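The complete measurement process consists of a measurement of the background and a measurement of the signal events. A complete $\chi^2$ is therefore:

$$\chi^2_{\mathrm{emp}} = \frac{(n_{\mathrm{obs}} - b_{\mathrm{true}})^2}{b_{\mathrm{true}}} + \frac{(b_{\mathrm{est}} - b_{\mathrm{true}})^2}{\sigma_b^2} \qquad (10)$$

and $\chi^2_{\mathrm{emp}} \sim \chi^2(2\ \mathrm{d.o.f.})$. This quantity is included in the figures.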
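For the Gaussian variant of the toy model, the statement that $\chi^2_{\mathrm{emp}}$ follows a $\chi^2$ distribution with two degrees of freedom holds exactly, which gives a quick sanity check of a replica generator. The sketch assumes the Poisson count is replaced by a Gaussian with variance $b_{\mathrm{true}}$, matching the first term of equation (10); the names are hypothetical.

```python
import math
import random


def chi2_emp(n_obs: float, b_est: float, b_true: float, sigma_b: float) -> float:
    """Eq. (10): both n_obs and b_est treated as measurements of b_true (s = 0)."""
    return (n_obs - b_true) ** 2 / b_true + (b_est - b_true) ** 2 / sigma_b ** 2


def tail_fraction_gaussian(b_true: float = 10.0, rel_sigma: float = 0.2, alpha: float = 0.01,
                           n_rep: int = 50000, seed: int = 2) -> float:
    """Fraction of Gaussian replicas with chi2_emp above the nominal 2-d.o.f. (1 - alpha) quantile."""
    sigma_b = rel_sigma * b_true
    # For 2 d.o.f. the chi^2 survival function is exp(-x / 2), so the quantile is -2 ln(alpha).
    threshold = -2.0 * math.log(alpha)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_rep):
        n_obs = rng.gauss(b_true, math.sqrt(b_true))  # Gaussian stand-in for the Poisson count
        b_est = rng.gauss(b_true, sigma_b)
        exceed += chi2_emp(n_obs, b_est, b_true, sigma_b) > threshold
    return exceed / n_rep
```

The returned fraction should agree with `alpha` within Monte Carlo precision; replacing the Gaussian count by a Poisson draw shows how far the Poisson case departs from the nominal distribution.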
Fig. 1. Quantiles of the distribution of $\chi^2$ (Nominal 1 d.o.f.), $\chi^2_{\mathrm{add}}$ (Quad. add.), $\chi^2_{\mathrm{prof}}$ (Profile) and the $\chi^2$ where uncertainties in the background are ignored (Ignore). The curve labeled "empirical" shows the $\chi^2$ where both the number of events and the background are considered as measurements. The left panel shows the results for the Poisson measurement process; the right panel assumes a Gaussian process. The uncertainty in the background estimate is assumed to be 20 %.



In figure 3 we show the relative difference in power between the quadratic addition and the profile method. For large signals the power, unsurprisingly, approaches one, i.e. the method used to calculate the test statistic does not matter. For low signals, however, the power of the profile method is up to 35 % larger than that of the method adding the uncertainty in quadrature.



5.1 Remark on the ensemble of experiment replicas



In our simple toy model the set of measurements is $(n, b_{\mathrm{est}})$, and since we know the distribution of these measurements, the ensemble of experiment replicas is easily constructed. Under more realistic experimental conditions, many different nuisance parameters, both correlated and uncorrelated, might need to be considered. This might become computationally very cumbersome, for example if full detector simulations need to be employed. Sometimes even uncertainties in theoretical estimates have to be considered. Treating these as random variables seems questionable, though it is possibly the only feasible way.

Fig. 2. Quantiles of the distribution of $\chi^2$ (Nominal 1 d.o.f.), $\chi^2_{\mathrm{add}}$ (Quad. add.), $\chi^2_{\mathrm{prof}}$ (Profile) and the $\chi^2$ where uncertainties in the background are ignored (Ignore). The curve labeled "empirical" shows the $\chi^2$ where both the number of events and the background are considered as measurements. The left panel shows the results for the Poisson measurement process; the right panel assumes a Gaussian process. The uncertainty in the background estimate is assumed to be 10 %.



6 Summary & Conclusions 

Two subjects have been discussed in this note: 
• The interpretation of "sensitivity" 

Usually, estimates of sensitivity are based on an average experimental result. For a Gaussian distribution of estimates, this implies that if the true value is $\theta_{13} = \theta_{13}^{\mathrm{sens}}$, the probability of claiming discovery is only 50 %. This is a well-known but often ignored fact. In our experience, many physicists have the notion that if the true value of the parameter is indeed equal to the sensitivity, then a discovery is very likely.

If the distribution of estimates is not Gaussian, in general no statement about the probability of detection can be made if only the average experimental result is presented. Consequently, statements about the probability of detection should be included when presenting sensitivity estimates. They can be calculated if the distribution of estimates is known or can be simulated.

Fig. 3. The relative difference between the power of $\chi^2_{\mathrm{prof}}$ and $\chi^2_{\mathrm{add}}$ as a function of the true signal parameter. Here the true background was assumed to be $b_{\mathrm{true}} = 10$.

• Effect of uncertainties 

The results of the toy model calculation show that if uncertainties in nuisance parameters are included in the calculation of sensitivities (and measurement results), extra care has to be taken to make sure that statistical statements (like the significance of a discovery, or the confidence level of an interval) are still valid. Unsurprisingly, the largest error is made when the uncertainties are ignored. Furthermore, the choice of method used to include the systematic uncertainties affects the probability of making a discovery. In addition, in the presence of sizable instrumental uncertainties, the ensemble of experiments used for calculating significance and power needs to be carefully defined.

When comparing sensitivity estimates for different experiments and experimental configurations, differences could therefore certainly arise from the way the uncertainties are included (if at all) in the calculations. It seems obvious that sensitivity curves need to be compared at the same "real" significance level and at the same "real" probability of discovery, whereas they are usually compared at the same nominal significance level and under the assumption that the probability of discovery is always 50 %.






The toy model presented here is a crude simplification of the actual experimental situation, where many measurement bins and different types of correlated and uncorrelated uncertainties have to be considered. For example, a generalization of the profiling method to a more realistic experimental situation, including many bins and correlated systematic uncertainties, is given in [13]. The results presented in figures 1, 2 and 3 should therefore rather serve as an inspiration for detailed studies of realistic experimental conditions (see for example [5]). If possible, the statistical quantities should be studied using Monte Carlo simulations of many replicas of the experiment under study. Monte Carlo simulations yield information not only on the correct false detection rates but also on the probability of detection, which is of obvious importance for assessing the sensitivity of future experiments.